… presented herein identify PCAT-1 as a novel transcriptional repressor implicated in a subset of prostate cancer patients.
INTRODUCTION

During the 2010-2011 funding period, I have been in the research phase of the M.D., Ph.D. Medical Scientist Training Program at the University of Michigan. My research focuses on the molecular basis of prostate cancer, with particular emphasis on the transformative role of novel technologies in understanding this disease. The goal of my research is to elucidate the molecular mechanisms underlying prostate cancer and to translate these findings into novel molecular diagnostics or therapies. The scope of the grant covers the use of innovative technologies and techniques to define new molecular aspects of disease, with a particular focus on the molecular basis of aggressive, lethal prostate cancer.

BODY 

Training program

Over the past year, the Department of Defense has supported my research efforts as a graduate student in the research phase of the M.D., Ph.D. Medical Scientist Training Program at the University of Michigan. My training program as a student in the Department of Pathology includes a weekly seminar series featuring presentations by students as well as faculty and distinguished guest lecturers. Attendance at these seminars is required, and I have also presented my data at this forum. The Department of Pathology also holds an annual Research Symposium, where students present posters and attend talks by faculty and invited guests. The University of Michigan Cancer Center also holds an annual Research Symposium, at which I presented a poster.


Mentorship 

My research program is guided by my thesis mentor, Dr. Arul Chinnaiyan. Dr. Chinnaiyan is a Professor of Pathology and Urology, a Howard Hughes Medical Institute Investigator, a Doris Duke Clinical Scholar, an American Cancer Society Investigator, and a Taubman Scholar at the University of Michigan. He provides a superb environment in which to learn science and offers insightful comments and keen expertise in prostate cancer research. Dr. Chinnaiyan has been instrumental in teaching me how to design and execute experiments, interpret data, and write research reports. I meet with him regularly and communicate with him frequently. I also receive guidance from the numerous other prostate cancer researchers in Dr. Chinnaiyan's lab, including junior faculty, postdocs, and other graduate students, with whom I interact daily. Finally, my thesis committee provides regular feedback about my work in formal meetings as well as in informal settings and by email.

My thesis committee members have been supportive and highly helpful.

Conferences 

I have been fortunate to attend multiple conferences during the 2010-2011 funding period. I attended and gave an oral presentation at the annual DoD Prostate Cancer Research Program IMPaCT Conference in March 2011. I also attended and presented a poster at the SPORE Prostate Cancer Program Retreat in March 2011. I gave an oral presentation at the American Association for Cancer Research (AACR) Annual Meeting in April 2011. I presented a poster at the Keystone Symposium "The Changing Landscape of the Cancer Genome" in June 2011.
Mentorship experiences

Over the past year, I have worked closely as a mentor to undergraduate students and other trainees in the Chinnaiyan laboratory. Specifically, I mentored two undergraduate students, as well as two Ph.D. students during their rotations through the lab. These experiences have been extremely valuable in developing my skill and comfort in mentoring other emerging scientists.

Honors/Awards 

I have received several awards in the past year. My poster presentation at the University of Michigan Cancer Research Symposium received an honorable mention. My poster presentation at the SPORE Prostate Cancer Program Retreat received first prize. I was awarded the AACR-Aflac Incorporated Scholar-in-Training Award for the 2011 AACR Annual Meeting.

Research Summary

The discovery of numerous non-coding RNA (ncRNA) transcripts in species ranging from yeast to mammals has dramatically altered our understanding of cell biology, especially the biology of diseases such as cancer. In humans, the identification of abundant long ncRNAs (lncRNAs) >200 bp in length has catalyzed their characterization as critical components of cancer biology. Recently, lncRNAs have been shown to drive both tumor-suppressive and oncogenic functions in prevalent cancer types, such as breast and prostate cancer.

High-throughput sequencing of polyA+ RNA (RNA-Seq) in human cancer has remarkable potential to identify both novel disease-specific markers for clinical use and uncharacterized aspects of tumor biology, particularly ncRNA species. To illustrate this approach, we performed RNA-Seq on a cohort of 102 prostate tissues and cell lines. We found that aberrant expression profiles of novel tissue-specific ncRNAs distinguished benign, cancerous, and metastatic samples, and we defined a core set of 121 novel ncRNAs whose dysregulation characterizes prostate cancer. Among these, a novel prostate cancer-specific ncRNA (termed PCAT-1) defined a subset of aggressive cancers with low expression of the epigenetic regulator EZH2, a component of Polycomb Repressive Complex 2 (PRC2) that is commonly upregulated in metastatic cancers. In vitro chromatin immunoprecipitation, RNA immunoprecipitation, and drug treatment assays for core PRC2 genes indicated that the PRC2 complex directly binds and represses PCAT-1, and that the PCAT-1 transcript reciprocally binds PRC2. By contrast, in vitro models with high levels of endogenous PCAT-1 transcript did not recapitulate PRC2-mediated repression, and in these cells siRNA-mediated knockdown of PCAT-1 decreased cell proliferation by 25-50%. Using gene expression arrays, we determined that PCAT-1 contributes to the transcriptional regulation of genes in several key biological processes, including the cell cycle. These data suggest that PCAT-1 exists in two biological states: a PRC2-repressed state and an active state that promotes proliferation.

Next, we showed that novel ncRNAs may serve a clinical purpose in the non-invasive detection and stratification of prostate cancer patients. We performed qPCR on patient urine samples (n=230) and found that a custom ncRNA expression signature, which includes PCAT-1, both detected prostate cancer effectively and yielded prognostic information. Indeed, a high ncRNA expression signature value correlated with high-grade histology (Gleason score >7 vs. Gleason score <6; p=0.01). Taken together, the findings presented herein establish the utility of RNA-Seq to comprehensively identify unannotated ncRNAs, such as PCAT-1, implicated in cancer.
Our data suggest that PCAT-1 promotes cell proliferation, that in its inactive state PCAT-1 is repressed by PRC2, and that PCAT-1 may serve as a candidate biomarker for non-invasive clinical tests. We further speculate that applying these methodologies to other diseases may reveal key aspects of disease biology and clinically important biomarkers, particularly for diseases that currently lack good non-invasive tests in fluids such as blood serum or urine.

The discovery of PCAT-1 highlights the power of unbiased transcriptome studies to explore the rich set of lncRNAs associated with cancer. While PCAT-1 is the first cancer lncRNA to be discovered by this method, we anticipate that many additional studies will employ this approach.

KEY RESEARCH ACCOMPLISHMENTS

• Defined the landscape of SOX4 expression across prostate cancer progression and disease subtypes (Figure 1).

• Defined SOX4 as an androgen-repressed gene (Figures 2 and 3).

• Determined global gene expression signatures associated with SOX4 and demonstrated that SOX4 knockdown results in increased E-cadherin mRNA levels (Figure 4).

• Defined novel RNA transcripts associated with SOX4 and prostate cancer (Figure 5).

• Characterized novel RNA transcripts as functional molecules in prostate cancer progression (see appended manuscript, Prensner et al. Nature Biotechnology 2011).

• Defined one novel RNA, named PCAT-1, as a regulator of cell proliferation through transcriptional repression of target genes. PCAT-1 is itself regulated by Polycomb Repressive Complex 2 (see appended manuscript, Prensner et al. Nature Biotechnology 2011).

• Evaluated the potential for novel RNA transcripts to be used as prostate cancer diagnostics detectable in patient urine (see appended manuscript, Prensner et al. Nature Biotechnology 2011).


REPORTABLE OUTCOMES

Publications

• Cao Q, Mani RS, Ateeq B, Dhanasekaran SM, Asangani IA, Prensner JR, Kim JH, Brenner JC, Jing X, Cao X, Wang R, Li Y, Dahiya A, Wang L, Pandhi M, Lonigro RJ, Wu YM, Tomlins SA, Palanisamy N, Qin Z, Yu J, Maher CA, Varambally S, Chinnaiyan AM. Coordinated Regulation of Polycomb Group Complexes through microRNAs in Cancer. Cancer Cell. 2011 Aug 16;20(2):187-99. PMID: 21840484

• Prensner JR, Iyer MK, Balbin OA, Dhanasekaran SM, Cao Q, Brenner JC, Laxman B, Asangani IA, Grasso CS, Kominsky HD, Cao X, Jing X, Wang X, Siddiqui J, Wei JT, Robinson D, Iyer HK, Palanisamy N, Maher CA, Chinnaiyan AM. Transcriptome sequencing across a prostate cancer cohort identifies PCAT-1, an unannotated lincRNA implicated in disease progression. Nature Biotechnology. 2011 Jul 31;29(8):742-9. doi: 10.1038/nbt.1914. PMID: 21804560

• Kim JH, Dhanasekaran SM, Prensner JR, Cao X, Robinson D, Kalyana-Sundaram S, Huang C, Shankar S, Jing X, Iyer M, Hu M, Sam L, Grasso C, Maher CA, Palanisamy N, Mehra R, Kominsky HD, Siddiqui J, Yu J, Qin ZS, Chinnaiyan AM. Deep sequencing reveals distinct patterns of DNA methylation in prostate cancer. Genome Research. 2011 Jul;21(7):1028-41. PMID: 21724842

• Wang XS, Shankar S, Dhanasekaran SM, Ateeq B, Sasaki A, Jing X, Robinson D, Cao Q, Prensner J, Yocum A, Wang R, Fries D, Han B, Asangani I, Cao X, Li Y, Omenn G, Pflueger D, Gopalan A, Reuter V, Kahoud ER, Cantley L, Rubin M, Palanisamy N, Varambally S, Chinnaiyan AM. Characterization of KRAS Rearrangements in Metastatic Prostate Cancer. Cancer Discovery. 2011;1(1):OF33-41.

• Prensner JR, Chinnaiyan AM. Metabolism unhinged: IDH mutations in cancer. Nature Medicine. 2011 Mar;17(3):291-3.

Presentations

• Keystone Symposium, The Changing Landscape of the Cancer Genome (June 2011); poster presentation

• AACR Annual Meeting (April 2011); oral presentation

• SPORE Prostate Cancer Program Retreat (March 2011); poster presentation

• Prostate Cancer Research Program (PCRP) IMPaCT conference (March 2011); poster presentation

• University of Michigan Cancer Research Symposium (November 2011); poster presentation

• University of Michigan Pathology Research Symposium (November 2011); poster presentation

Awards

• AACR-Aflac Incorporated Scholar-in-Training Award, AACR (April 2011)

• SPORE Prostate Cancer Program Retreat, First-prize poster award (March 2011)
CONCLUSION

The work funded by this project establishes the utility of RNA profiling to elucidate the molecular basis of prostate cancer. Through the profiling of patient tumor RNA, we have not only nominated SOX4 as a prostate cancer gene but have also examined a co-regulated transcriptional network of novel RNA transcripts associated with prostate cancer. We have characterized PCAT-1 as a novel noncoding RNA upregulated in prostate cancer, and we have determined that detection of ncRNAs in patient urine may be a promising avenue for non-invasive biomarkers. PCAT-1 drives cell proliferation and represses key target genes to achieve this effect. Future work would benefit from profiling non-polyadenylated RNA species as well, since these likely also have a role in prostate cancer progression. In summary, this work has discovered new genes and characterized their functions in prostate cancer. It therefore expands our knowledge and understanding of this disease and nominates novel biomarkers detectable in patient urine samples.
Figure 1: Nomination of SOX4 as a metastasis-associated gene. (A) RNA-Seq data from a cohort of prostate cancer patient samples (benign, n=6; localized cancer, n=23; metastases, n=20). SOX4 shows elevated mRNA levels in localized cancer and substantially elevated mRNA levels in metastases. (B) SOX4 levels in individual samples show elevated mRNA levels in metastases. SOX4 expression is correlated with EZH2 expression, which is also elevated in metastases.
Figure 2: SOX4 is repressed by androgen stimulation. VCaP cells were starved of androgens for 48 hours and then stimulated with 10 nM R1881, a synthetic androgen. A time-course analysis of RNA expression shows decreasing SOX4 mRNA levels following induction of androgen signaling.
Figure 3: The SOX4 locus is bound by the androgen receptor and ERG. ChIP-Seq was performed in LNCaP and VCaP prostate cancer cells for AR (in both cell lines) or ERG (in VCaP). AR ChIP-Seq was performed under both androgen-depleted and androgen-stimulated (R1881) conditions. AR, as well as ERG, binds the genomic locus of SOX4 upstream of the gene's transcriptional start site.
Figure 4: SOX4 is associated with metastasis gene signatures and represses E-cadherin mRNA levels. (A) Analysis of SOX4 expression levels using the Oncomine database nominates SOX4 as a key prostate cancer metastasis outlier gene. Analysis of SOX4 in 6 prostate cancer metastasis datasets shows that SOX4 is commonly upregulated in prostate cancer metastases. (B) Knockdown of SOX4 in the VCaP cell line results in upregulation of E-cadherin mRNA levels.




Prensner, John (W81XWH-10-1-0551)


[Figure 5 heatmap: unannotated transcripts (rows, labeled by genomic coordinates) across prostate cancer samples (columns) ranked by increasing SOX4 expression]


Figure 5: SOX4 is associated with novel unannotated transcripts. Prostate cancer samples analyzed by RNA-Seq were segregated according to SOX4 expression (SOX4 high vs. SOX4 low). Clustering of SOX4 with RNA-Seq predictions for unannotated lncRNA transcripts reveals a signature of novel transcripts both correlated and anti-correlated with SOX4. The samples in the heatmap above are ranked by increasing SOX4 expression, so samples on the right side of the plot have high SOX4 expression.
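The caption describes grouping unannotated transcripts by whether they track with SOX4 across samples. The exact clustering procedure is not specified in this report, so the following is only a hedged sketch of one simple way to build such a signature: rank samples by SOX4 (the heatmap's column order) and split transcripts by their Pearson correlation with SOX4. The function names, the 0.5 cutoff and the toy expression values are all hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def sox4_signature(
    sox4: Sequence[float],
    transcripts: Dict[str, Sequence[float]],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the study
) -> Tuple[List[int], List[str], List[str]]:
    """Return (sample order by increasing SOX4, correlated names, anti-correlated names)."""
    # Column order for the heatmap: samples ranked by increasing SOX4 expression.
    order = sorted(range(len(sox4)), key=lambda i: sox4[i])
    correlated: List[str] = []
    anticorrelated: List[str] = []
    for name, values in transcripts.items():
        r = pearson(sox4, values)
        if r >= threshold:
            correlated.append(name)
        elif r <= -threshold:
            anticorrelated.append(name)
    return order, correlated, anticorrelated


# Toy data: four samples, three hypothetical unannotated transcripts.
order, up, down = sox4_signature(
    [5.0, 1.0, 3.0, 9.0],
    {"tx_a": [10, 2, 6, 18], "tx_b": [0, 8, 4, -8], "tx_c": [1, 1, 2, 1]},
)
# order == [1, 2, 0, 3]; up == ["tx_a"]; down == ["tx_b"]
```

Hierarchical clustering or a rank-based (Spearman) correlation would be equally reasonable choices; Pearson is used here only because it is short to write without external libraries.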
Transcriptome sequencing across a prostate cancer
cohort identifies PCAT-1, an unannotated lincRNA
implicated in disease progression

John R Prensner1,8, Matthew K Iyer1,8, O Alejandro Balbin1, Saravana M Dhanasekaran1,2, Qi Cao1, J Chad Brenner1, Bharathi Laxman3, Irfan A Asangani1, Catherine S Grasso1, Hal D Kominsky1, Xuhong Cao1, Xiaojun Jing1, Xiaoju Wang1, Javed Siddiqui1, John T Wei4, Daniel Robinson1, Hari K Iyer5, Nallasivam Palanisamy1,2,6, Christopher A Maher1,2 & Arul M Chinnaiyan1,2,4,6,7


Noncoding RNAs (ncRNAs) are emerging as key molecules in human cancer, with the potential to serve as novel markers of disease and to reveal uncharacterized aspects of tumor biology. Here we discover 121 unannotated prostate cancer-associated ncRNA transcripts (PCATs) by ab initio assembly of high-throughput sequencing of polyA+ RNA (RNA-Seq) from a cohort of 102 prostate tissues and cell lines. We characterized one ncRNA, PCAT-1, as a prostate-specific regulator of cell proliferation and show that it is a target of the Polycomb Repressive Complex 2 (PRC2). We further found that patterns of PCAT-1 and PRC2 expression stratified patient tissues into molecular subtypes distinguished by expression signatures of PCAT-1-repressed target genes. Taken together, our findings suggest that PCAT-1 is a transcriptional repressor implicated in a subset of prostate cancer patients. These findings establish the utility of RNA-Seq to identify disease-associated ncRNAs that may improve the stratification of cancer subtypes.


Recently, RNA-Seq has provided a method to delineate the entire set of transcriptional aberrations in a disease, including novel transcripts not measured by conventional analyses1-5. To facilitate interpretation of sequence read data, existing computational methods typically process individual samples using either short-read gapped alignment followed by ab initio reconstruction2,3 or de novo assembly of read sequences followed by sequence alignment4,5. These methods provide a powerful framework to uncover uncharacterized RNA species, including antisense transcripts, short RNAs <250 bp or long intergenic ncRNAs (lincRNAs) >250 bp.

Although still largely unexplored, ncRNAs, particularly lincRNAs, have emerged as a new aspect of biology, with evidence suggesting that they are frequently cell-type specific, contribute important functions to numerous systems6,7 and may interact with known cancer genes such as EZH2 (ref. 8). Indeed, several well-described examples, such as HOTAIR8,9 and ANRIL10,11, indicate that ncRNAs may be essential actors in cancer biology, typically facilitating epigenetic gene repression through chromatin-modifying complexes12,13. Moreover, ncRNA expression may confer clinical information about disease outcomes and have utility as diagnostic tests9,14. The characterization of RNA species, their functions and their clinical applicability is therefore a major area of biological and clinical importance.

Here, we describe a comprehensive analysis of lincRNAs in 102 prostate cancer tissue samples and cell lines by RNA-Seq. We apply ab initio computational approaches to delineate the annotated and unannotated transcripts in this disease, and we find 121 ncRNAs, termed PCATs, whose expression patterns distinguish benign, localized cancer and metastatic cancer samples. Notably, we discover PCAT-1, a previously undescribed prostate cancer ncRNA that demonstrates either repression by PRC2 or an active role in promoting cell proliferation through transcriptional regulation of target genes. To our knowledge, our findings describe the first comprehensive study of lincRNAs in prostate cancer, provide a computational framework for large-scale RNA-Seq analyses and describe PCAT-1 as a prostate cancer ncRNA functionally implicated in disease progression.

RESULTS 

RNA-Seq analysis of the prostate cancer transcriptome

Over two decades of research have generated a genetic model of prostate cancer based on numerous neoplastic events, such as loss of the PTEN15 tumor suppressor gene and gain of oncogenic ETS family transcription factor gene fusions16-18 in large subsets of prostate cancer patients. As some patients lack these genetic aberrations, we hypothesized that prostate cancer similarly harbors disease-associated ncRNAs that characterize specific molecular subtypes.

To pursue this hypothesis, we applied transcriptome sequencing to a cohort of 102 prostate tissues and cell lines: 20 benign adjacent prostates (benign), 47 localized prostate cancers (PCA), 14 metastatic tumors and 21 prostate cell lines. From a total of 1.723 billion sequence fragments from 201 lanes of sequencing (108 paired-end and 93 single reads on the Illumina Genome Analyzer and Genome Analyzer II), we performed short-read gapped alignment19 and recovered 1.41 billion mapped reads, with a median of 14.7 million mapped reads per sample (Supplementary Table 1). We used the Cufflinks ab initio assembly approach3 to produce, for each sample, the most probable set of putative transcripts that served as the RNA templates for the sequence fragments in that sample (Fig. 1a and Supplementary Figs. 1 and 2).

Figure 1 Analysis of transcriptome data for the detection of unannotated transcripts. (a) Schematic overview of the methodology employed in this study. (b) Graphical representation of the bioinformatics filters used to merge individual transcriptome libraries into a single consensus transcriptome. The merged consensus transcriptome was generated by compiling all individual transcriptome libraries and using individual decision tree classifiers for each chromosome to define high-confidence ‘expressed’ transcripts and low-confidence ‘background’ transcripts, which were discarded. The example decision tree on the left was trained on transcripts on chromosome 1. The graphics on the right illustrate the application of the informatics filtration pipeline to sample assembled transcripts. (c) After informatic processing and filtration of the sequencing data, transcripts were categorized to identify unannotated ncRNAs. Transcribed pseudogenes were isolated, and the remaining transcripts were categorized based on overlap with an aggregated set of known gene annotations into annotated protein coding, noncoding and unannotated. Both annotated and unannotated ncRNA transcripts were then separated into intronic, intergenic and antisense categories based on their relationship to protein-coding genes.

As expected from a large tumor tissue cohort, individual transcript assemblies may have sources of noise, such as artifacts of the sequence alignment process, unspliced intronic pre-mRNA and genomic DNA contamination. To exclude these from our analyses, we trained a decision tree to classify transcripts as expressed versus background on the basis of transcript length, number of exons, recurrence in multiple samples and other structural characteristics (Fig. 1b, left, and Supplementary Methods). The classifier demonstrated a sensitivity of 70.8% and specificity of 88.3% when trained using transcripts that overlapped genes in the AceView database20, including 11.7% of unannotated transcripts that were classified as expressed (Fig. 1b, right). We then clustered the expressed transcripts into a consensus transcriptome and applied additional heuristic filters to further refine the assembly (Supplementary Methods). The final ab initio transcriptome assembly yielded 35,415 distinct transcriptional loci (Supplementary Table 2 and Supplementary Methods).
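To make the reported sensitivity and specificity concrete, here is a minimal sketch of how such figures are computed from labelled training transcripts. The study trained per-chromosome decision trees (Supplementary Methods); the hand-set thresholds in `toy_rule` below are invented for illustration and are not the learned tree. The class name, function names and example transcripts are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass
class AssembledTranscript:
    length: int          # transcript length in nucleotides
    num_exons: int       # number of exons in the assembled model
    recurrence: int      # number of samples in which a matching transcript was assembled
    is_expressed: bool   # training label, e.g. overlaps an annotated gene


def toy_rule(t: AssembledTranscript) -> bool:
    """Hypothetical hand-set thresholds standing in for a trained decision tree."""
    if t.num_exons > 1:
        return t.recurrence >= 2
    return t.length >= 1000 and t.recurrence >= 5


def sensitivity_specificity(
    transcripts: Iterable[AssembledTranscript],
    classify: Callable[[AssembledTranscript], bool],
) -> Tuple[float, float]:
    tp = fp = tn = fn = 0
    for t in transcripts:
        predicted = classify(t)
        if predicted and t.is_expressed:
            tp += 1
        elif predicted:
            fp += 1          # background transcript wrongly kept
        elif t.is_expressed:
            fn += 1          # expressed transcript wrongly discarded
        else:
            tn += 1
    sensitivity = tp / (tp + fn)  # fraction of labelled-expressed transcripts kept
    specificity = tn / (tn + fp)  # fraction of labelled-background transcripts discarded
    return sensitivity, specificity


examples = [
    AssembledTranscript(2500, 4, 10, True),   # kept (true positive)
    AssembledTranscript(800, 3, 1, True),     # discarded (false negative)
    AssembledTranscript(1500, 1, 6, True),    # kept (true positive)
    AssembledTranscript(300, 1, 1, False),    # discarded (true negative)
    AssembledTranscript(1200, 1, 2, False),   # discarded (true negative)
    AssembledTranscript(600, 2, 3, False),    # kept (false positive)
    AssembledTranscript(400, 1, 8, False),    # discarded (true negative)
]
sens, spec = sensitivity_specificity(examples, toy_rule)  # 2/3 and 0.75
```

In a real pipeline a learned model (for example scikit-learn's `DecisionTreeClassifier`) would replace `toy_rule`; the metric computation stays the same.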

Discovery of prostate cancer noncoding RNAs

We compared the assembled prostate cancer transcriptome to the UCSC, Ensembl, RefSeq, Vega and ENCODE gene databases to identify and categorize transcripts (Fig. 1c). The majority of the transcripts (77.3%) corresponded to annotated protein-coding genes (72.1%) and noncoding RNAs (5.2%), but a substantial percentage (19.8%) lacked any overlap and were designated unannotated (Fig. 2a). These included partially intronic antisense (2.44%), totally intronic (12.1%) and intergenic transcripts (5.25%), consistent with previous reports of unannotated transcription21-23. Because of the added complexity of characterizing antisense or partially intronic transcripts without strand-specific RNA-Seq libraries, we focused on totally intronic and intergenic transcripts.
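The overlap-based categorization described above can be sketched with simple interval logic. This is a simplified, hypothetical rule set for a single chromosome, using half-open coordinates and assuming the transcript's strand is known; it is not the study's pipeline, which also handled transcribed pseudogenes, annotated ncRNAs and partially intronic cases. All names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] < b[1] and b[0] < a[1]


@dataclass
class TranscriptModel:
    exons: List[Interval]
    strand: str  # "+" or "-"

    @property
    def span(self) -> Interval:
        return (min(s for s, _ in self.exons), max(e for _, e in self.exons))


def exon_overlap(a: TranscriptModel, b: TranscriptModel) -> bool:
    return any(overlaps(x, y) for x in a.exons for y in b.exons)


def categorize(tx: TranscriptModel, genes: List[TranscriptModel]) -> str:
    """Simplified, hypothetical categories relative to known gene models."""
    # Exonic overlap with a gene on the same strand: an annotated transcript.
    if any(exon_overlap(tx, g) and g.strand == tx.strand for g in genes):
        return "annotated"
    # Exonic overlap only with genes on the opposite strand: antisense.
    if any(exon_overlap(tx, g) for g in genes):
        return "antisense"
    # No exonic overlap, but wholly inside a gene's span: totally intronic.
    start, end = tx.span
    if any(g.span[0] <= start and end <= g.span[1] for g in genes):
        return "intronic"
    return "intergenic"


gene = TranscriptModel([(1000, 1200), (5000, 5300)], "+")
print(categorize(TranscriptModel([(2000, 2500), (3000, 3200)], "+"), [gene]))  # intronic
```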

Global characterization of unannotated intronic and intergenic transcripts demonstrated that they were more highly expressed (Fig. 2b), had greater overlap with expressed sequence tags (ESTs) (Supplementary Fig. 3) and displayed a clear but subtle increase in conservation over randomly permuted controls (intergenic transcripts P = 2.7 × 10^-4 ± 0.0002 for 0.4 < ω < 0.8; intronic transcripts P = 2.6 × 10^-5 ± 0.0017 for 0 < ω < 0.4, Fisher's exact test, Fig. 2c). By contrast, unannotated transcripts scored lower than protein-coding genes for these metrics, which corroborates data in previous reports2,24. Notably, a small subset of unannotated intronic transcripts showed a profound degree of conservation (Fig. 2c, inset). Finally, analysis of coding potential revealed that only 5 of 6,144 transcripts harbored a high-quality open reading frame (ORF), indicating that the vast majority of these transcripts represent ncRNAs (Supplementary Fig. 4).
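The conservation comparison above is reported with Fisher's exact test. As a hedged illustration of what that test computes, here is a minimal two-sided Fisher's exact test on a 2×2 table (for instance, transcripts vs. permuted controls, inside vs. outside a conservation bin), written with the standard library only. The function names and example counts are hypothetical; the study's actual contingency tables are not given in this section, and in practice one would usually call `scipy.stats.fisher_exact`.

```python
from math import comb
from typing import Sequence


def table_probability(a: int, row1: int, col1: int, n: int) -> float:
    """Hypergeometric probability of a 2x2 table with top-left cell a, given fixed margins."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact_two_sided(table: Sequence[Sequence[int]]) -> float:
    """Sum the probabilities of all tables with the same margins that are
    no more likely than the observed table."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    p_observed = table_probability(a, row1, col1, n)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p_value = 0.0
    for x in range(lo, hi + 1):
        p = table_probability(x, row1, col1, n)
        if p <= p_observed * (1 + 1e-7):  # small tolerance for floating-point ties
            p_value += p
    return min(p_value, 1.0)


if __name__ == "__main__":
    # Hypothetical counts: rows = unannotated transcripts / permuted controls,
    # columns = inside / outside a conservation-score bin.
    print(fisher_exact_two_sided([[3, 1], [1, 3]]))  # approximately 0.486
```

The two-sided convention used here (summing all tables no more probable than the observed one) is the common one also used by SciPy and R.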
Figure 2 Prostate cancer transcriptome sequencing reveals dysregulation of unannotated transcripts. (a) Global overview of transcription in prostate cancer. The pie chart on the left displays transcript distribution in prostate cancer. The pie charts on the right display unannotated (upper) or annotated (lower) ncRNAs categorized as sense transcripts (intergenic and intronic) and antisense transcripts, respectively. (b) Line graph showing that unannotated transcripts are more highly expressed (reads per kilobase of transcript per million mapped reads; RPKM) than control regions. Negative control intervals were generated by randomly permuting the genomic positions of the transcripts. (c) Conservation analysis comparing unannotated transcripts to known genes and intronic controls shows a subtle degree of purifying selection among unannotated transcripts. The inset on the right shows an enlarged view. (d-g) Intersection plots displaying the fraction of unannotated transcripts enriched for H3K4me2 (d), H3K4me3 (e), acetyl-H3 (f) or RNA polymerase II (g) at their transcriptional start site (TSS) using ChIP-Seq and RNA-Seq data for the VCaP prostate cancer cell line. The legend applies to plots in b-g. (h) A pie chart displaying the distribution of differentially expressed transcripts in prostate cancer (FDR < 0.01).
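RPKM, as defined in panel b, divides a transcript's read count by its length in kilobases and by the library size in millions of mapped reads. A minimal sketch of that arithmetic follows; the function name is hypothetical, and Cufflinks itself reports the paired-end analogue, FPKM, which counts fragments rather than reads.

```python
def rpkm(read_count: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    kilobases = transcript_length_bp / 1_000
    millions_mapped = total_mapped_reads / 1_000_000
    return read_count / (kilobases * millions_mapped)

# 500 reads on a 2,000-bp transcript in a library of 25 million mapped reads.
print(rpkm(500, 2_000, 25_000_000))  # 10.0
```

Normalizing by both length and depth is what allows long and short transcripts, and deep and shallow libraries, to be compared on one scale.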


To determine whether our unannotated transcripts were supported by histone modifications defining active transcriptional units, we used published prostate cancer chromatin immunoprecipitation (ChIP)-Seq data for two prostate cell lines, VCaP and LNCaP (ref. 25; Supplementary Table 3). After filtering our data set for transcribed repetitive elements known to display alternative patterns of histone modifications (ref. 26), we observed a strong enrichment for histone modifications characterizing transcriptional start sites (TSSs) and active transcription, including H3K4me2, H3K4me3, acetyl-H3 and RNA polymerase II (Fig. 2d-g), but not H3K4me1, which characterizes enhancer regions (ref. 27; Supplementary Figs. 5 and 6). Notably, intergenic ncRNAs showed greater enrichment than intronic ncRNAs in these analyses (Fig. 2d-g).

To elucidate global changes in transcript abundance in prostate cancer, we analyzed differential expression for all transcripts. We found 836 genes differentially expressed between benign samples and localized tumors (false-discovery rate (FDR) < 0.01). Annotated protein-coding and ncRNA genes constituted 82.8% and 7.4% of differentially expressed genes, respectively, including known prostate cancer biomarkers such as AMACR (ref. 28), HPN (ref. 29) and PCA3 (ref. 14) (Fig. 2h, Supplementary Fig. 2 and Supplementary Table 4). The remaining 9.8% of differentially expressed genes corresponded to unannotated ncRNAs, 3.2% within gene introns and 6.6% in intergenic regions.
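Calling genes differentially expressed at FDR < 0.01 means controlling the expected fraction of false positives among the calls. The study used SAM (Online Methods), which estimates FDR by permutation; the sketch below instead shows the simpler Benjamini-Hochberg step-up adjustment (the correction named for the DAVID P values in Fig. 5c) purely to illustrate how a per-gene threshold is applied. It is not a reimplementation of SAM, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Step down from the largest p-value so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical per-gene p-values; genes with adjusted value < 0.01 are "called".
q = benjamini_hochberg([0.0001, 0.004, 0.03, 0.2])
print([gene for gene, value in enumerate(q) if value < 0.01])  # [0, 1]
```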

Characterization  of  PCATs 

As ncRNAs may contribute to human disease (refs. 6-9), we identified aberrantly expressed, uncharacterized ncRNAs in prostate cancer. We found a total of 1,859 unannotated lincRNAs throughout the human genome. Overall, these intergenic RNAs resided approximately halfway between two protein-coding genes (Supplementary Fig. 7), and over one-third (34.1%) were >10 kb from the nearest protein-coding gene, which is consistent with previous reports (ref. 30) and supports the independence of intergenic ncRNA genes. For example, visualizing the Chr15q arm using the Circos program (http://circos.ca/) illustrated the genomic positions of 89 unannotated intergenic transcripts, including one differentially expressed gene centromeric to TLE3 (Supplementary Fig. 8).

A focused analysis of the 1,859 unannotated intergenic RNAs yielded 106 that were differentially expressed in localized tumors (FDR < 0.05; Fig. 3a). A cancer outlier expression analysis (Supplementary Methods) similarly nominated numerous unannotated ncRNA outliers (Fig. 3b) as well as known prostate cancer outliers, such as ERG (ref. 18), ETV1 (refs. 17,18), SPINK1 (ref. 31) and CRISP3 (ref. 32). Merging these results produced a set of 121 unannotated transcripts that accurately discriminated benign, localized tumor and metastatic prostate samples by unsupervised clustering (Fig. 3a). Clustering analyses using unannotated ncRNA outliers also suggested disease subtypes (Supplementary Fig. 9). These 121 unannotated transcripts were ranked and named as PCATs according to their fold-change in localized tumor versus benign tissue (Supplementary Tables 5-7).
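The outlier analysis is described as a modified COPA (cancer outlier profile analysis) method, with the modifications given only in the Supplementary Methods. The sketch below shows the basic COPA idea as an assumption: median-center each gene across samples, scale by the median absolute deviation (MAD), and score the gene by a high percentile of the scaled values, so that genes strongly elevated in a subset of samples rank highly. The 90th-percentile default, the interpolation rule and the MAD = 0 handling are illustrative choices, not the study's parameters.

```python
import statistics

def copa_score(values: list[float], percentile: float = 90.0) -> float:
    """Basic COPA outlier score for one gene across samples.

    Median-centers the values, scales them by the median absolute deviation
    (MAD) and returns the requested percentile (linear interpolation).
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return 0.0  # no spread to scale by; treated as non-outlier (assumption)
    scaled = sorted((v - med) / mad for v in values)
    pos = (len(scaled) - 1) * percentile / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(scaled) - 1)
    return scaled[lo] + (scaled[hi] - scaled[lo]) * (pos - lo)

# Hypothetical expression in ten samples: gene_a is very high in two of them.
gene_a = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0, 1.3, 0.7, 6.0, 8.0]
gene_b = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0, 1.3, 0.7, 1.1, 0.9]
print(copa_score(gene_a) > copa_score(gene_b))  # True
```

Using the median and MAD rather than the mean and standard deviation keeps the few outlier samples from inflating the scale they are measured against.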

Validation  of  novel  ncRNAs 

To gain confidence in our transcript nominations, we validated multiple unannotated transcripts in vitro by reverse transcription PCR (RT-PCR) and quantitative real-time PCR (qPCR) (Supplementary Fig. 10). qPCR for four transcripts (PCAT-114, PCAT-14, PCAT-43 and PCAT-1) on two independent cohorts of prostate tissues confirmed the predicted cancer-specific expression patterns (Fig. 3c-f and Supplementary Fig. 11). Notably, all four are prostate-specific, with minimal expression seen by qPCR in breast (n = 14) or lung cancer (n = 16) cell lines or in 19 normal tissue types (Supplementary Table 8). This is further supported by expression analysis of these transcripts in our RNA-Seq compendium of 13 tumor types, representing 325 samples (Supplementary Fig. 12). This tissue specificity was not necessarily due to regulation by androgen receptor signaling, as only PCAT-14 expression was induced when androgen-responsive VCaP and LNCaP cells were treated with the synthetic androgen R1881, consistent with previous data from this locus (ref. 17; Supplementary Fig. 13). PCAT-1 and PCAT-14 also showed cancer-specific upregulation when tested on a panel of matched tumor-normal pair samples (Supplementary Fig. 14).

Figure 3 Unannotated intergenic transcripts differentiate prostate cancer and benign prostate samples. (a) Unsupervised clustering analysis of differentially expressed or outlier unannotated intergenic transcripts separates benign samples, localized tumors and metastatic cancers. Expression is plotted as log2 fold-change relative to the median of the benign samples. The four transcripts detailed in this study are indicated on the side. (b) Cancer outlier expression analysis for the prostate cancer transcriptome ranks unannotated transcripts prominently. (c-f) qPCR on an independent cohort of prostate and nonprostate samples (benign (n = 19), PCA (n = 35), metastatic (MET) (n = 31), prostate cell lines (n = 7), breast cell lines (n = 14), lung cell lines (n = 16), other normal samples (n = 19); Supplementary Table 8) measures expression levels of four nominated ncRNAs upregulated in prostate cancer: PCAT-14 (c), PCAT-43 (d), PCAT-114 (e) and PCAT-1 (f). Inset tables on the right quantify ‘positive’ and ‘negative’ expressing samples using the cut-off value (shown as black dashed lines). Statistical significance was determined using a Fisher’s exact test. qPCR analysis was performed by normalizing to GAPDH and the median expression of the benign samples.
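The Fig. 3 caption states that qPCR values were normalized to GAPDH and to the median of the benign samples. One common way to do this, assumed here rather than taken from the source, is the comparative Ct (2^-ΔΔCt) approach, which presumes roughly 100% amplification efficiency for both primer pairs. The sample names, Ct values and helper names below are hypothetical.

```python
import statistics

def relative_to_reference(ct_target: float, ct_reference: float) -> float:
    """2^-(Ct_target - Ct_reference): expression relative to a reference gene."""
    return 2.0 ** -(ct_target - ct_reference)

def fold_change_vs_benign(samples: dict[str, tuple[float, float]],
                          benign_ids: list[str]) -> dict[str, float]:
    """Normalize each sample to GAPDH, then to the median benign sample.

    `samples` maps a sample ID to (Ct of target, Ct of GAPDH).
    """
    rel = {sid: relative_to_reference(t, g) for sid, (t, g) in samples.items()}
    benign_median = statistics.median([rel[sid] for sid in benign_ids])
    return {sid: value / benign_median for sid, value in rel.items()}

# Hypothetical Ct values: three benign samples and one tumor.
cts = {"B1": (30.0, 20.0), "B2": (31.0, 20.0), "B3": (29.0, 20.0), "T1": (25.0, 20.0)}
fc = fold_change_vs_benign(cts, ["B1", "B2", "B3"])
print(fc["T1"])  # 32.0: five cycles earlier than the median benign sample
```

A sample would then be scored ‘positive’ if its fold change exceeds the cut-off drawn in each panel; those cut-off values are not given in this text.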

Of note, PCAT-114, which ranks as the fifth-best outlier, just ahead of ERG (Fig. 3b and Supplementary Table 7), appears as part of a large (>500 kb) locus of expression in a gene desert in Chr2q31. We termed this region ‘second chromosome locus associated with prostate-1’ (SChLAP1) (Supplementary Fig. 15). Careful analysis of the SChLAP1 locus revealed both discrete transcripts and intronic transcription, highlighting this region as an intriguing aspect of the prostate cancer transcriptome.

PCAT-1, an unannotated prostate cancer lincRNA

To explore several transcripts more closely, we carried out 5' and 3' rapid amplification of cDNA ends (RACE) for PCAT-1 and PCAT-14. Interestingly, the PCAT-14 locus contained components of viral ORFs from the HERV-K endogenous retrovirus family (Supplementary Fig. 16), whereas PCAT-1 incorporates portions of a mariner family transposase (refs. 33,34), an Alu element and a viral long terminal repeat promoter region (Fig. 4a and Supplementary Fig. 17). Whereas PCAT-14 was upregulated in localized prostate cancer but largely absent in metastases (Fig. 3c), PCAT-1 was strikingly upregulated in a subset of metastatic and high-grade localized (Gleason score >7) cancers (Fig. 3f and Supplementary Fig. 11). Because of this notable profile, we hypothesized that PCAT-1 expression might be coordinated with that of the oncoprotein EZH2, a core PRC2 protein that is upregulated in solid tumors and contributes to a metastatic phenotype (refs. 35,36). Surprisingly, we found that PCAT-1 and EZH2 expression were nearly mutually exclusive (Fig. 4b), with only one patient showing outlier expression of both. This suggests that outlier PCAT-1 and EZH2 expression may define two subsets of high-grade disease.

PCAT-1 is located in the chromosome 8q24 gene desert ~725 kb upstream of the c-MYC oncogene. To confirm that PCAT-1 is a noncoding gene, we cloned the full-length PCAT-1 transcript and performed in vitro translation assays, which were negative as expected (Supplementary Fig. 18). Next, because Chr8q24 is known to harbor prostate cancer-associated single nucleotide polymorphisms (SNPs) and to exhibit frequent chromosomal amplification (refs. 37-42), we evaluated whether the relationship between EZH2 and PCAT-1 was specific or generalized. To address this, we measured expression levels of c-MYC and NCOA2, two proposed targets of Chr8q amplification (refs. 39,42), by qPCR. Neither c-MYC nor NCOA2 levels showed striking expression relationships to PCAT-1, EZH2 or each other (Supplementary Fig. 19). Likewise, PCAT-1 outlier expression was not dependent on Chr8q24 amplification, as highly expressing localized tumors often did not have 8q24 amplification, and high copy number gain of 8q24 was not sufficient to upregulate PCAT-1 (Supplementary Figs. 20 and 21).

PCAT-1  function  and  regulation 

Although upregulation of the ncRNA HOTAIR has been reported to participate in PRC2 function in breast cancer (ref. 9), we do not observe strong expression of this ncRNA in prostate (Supplementary Fig. 22), suggesting that other ncRNAs may be important in this cancer. To determine the mechanism for the expression profiles of PCAT-1 and EZH2, we inhibited EZH2 activity in VCaP cells, which express low-to-moderate levels of PCAT-1. Knockdown of EZH2 by short hairpin (sh)RNA or pharmacologic inhibition of EZH2 with the inhibitor 3-deazaneplanocin A (DZNep) caused a dramatic upregulation in PCAT-1 expression levels (Fig. 4c,d), as did treatment of VCaP cells with the demethylating agent 5'deoxyazacytidine, the histone deacetylase inhibitor SAHA or both (Fig. 4e). ChIP assays also demonstrated that SUZ12, a core PRC2 protein, directly binds the PCAT-1 promoter ~1 kb upstream of the TSS (Fig. 4f). Notably, RNA immunoprecipitation similarly showed binding of PCAT-1 to SUZ12 protein in VCaP cells (Supplementary Fig. 23a). RNA immunoprecipitation assays followed by RNase A, RNase H or DNase I treatment either abolished, partially preserved or totally preserved this interaction, respectively (Supplementary Fig. 23b). Because RNase A degrades single-stranded RNA, RNase H degrades the RNA strand of RNA/DNA hybrids and DNase I degrades DNA, this suggests that PCAT-1 exists primarily as a single-stranded RNA and secondarily as an RNA/DNA hybrid.

ADVANCE ONLINE PUBLICATION NATURE BIOTECHNOLOGY
© 2011 Nature America, Inc. All rights reserved.

Figure 4 PCAT-1 is a marker of aggressive cancer and a PRC2-repressed ncRNA. (a) The genomic location of PCAT-1 determined by 5' and 3' RACE, with DNA sequence features indicated by the colored boxes. (b) qPCR for PCAT-1 (y axis) and EZH2 (x axis) on a cohort of benign (n = 19), localized tumor (n = 35) and metastatic cancer (n = 31) samples. The inset table quantifies patient subsets demarcated by the gray dashed lines. (c) Knockdown of EZH2 in VCaP resulted in upregulation of PCAT-1. Data were normalized to GAPDH and represented as fold-change. ERG and β-actin serve as negative controls. The inset western blot indicates EZH2 knockdown. (d) Treatment of VCaP cells with 0.1 µM of the EZH2 inhibitor DZNep or vehicle control (DMSO) shows increased expression of PCAT-1 transcript after EZH2 inhibition. (e) PCAT-1 expression is increased upon treatment of VCaP cells with the demethylating agent 5'azacytidine (5'Aza), the histone deacetylase inhibitor SAHA or a combination of both. qPCR data were normalized to the average of (GAPDH + β-actin) and represented as fold-change. GSTP1 and FKBP5 are positive and negative controls, respectively. (f) ChIP assays for SUZ12 demonstrated direct binding of SUZ12 to the PCAT-1 promoter. Primer locations are indicated (boxed numbers) in the PCAT-1 schematic.

To explore the functional role of PCAT-1 in prostate cancer, we stably overexpressed full-length PCAT-1 or controls in RWPE benign immortalized prostate cells. We observed a modest but consistent increase in cell proliferation when PCAT-1 was overexpressed at physiological levels (Fig. 5a and Supplementary Fig. 24). Next, we designed short interfering (si)RNA oligos to PCAT-1 and performed knockdown experiments in LNCaP cells, which express higher levels of PCAT-1 without PRC2-mediated repression (Supplementary Fig. 25). Supporting our overexpression data, knockdown of PCAT-1 with three independent siRNA oligos resulted in a 25-50% decrease in cell proliferation in LNCaP cells (Fig. 5b), but not in control DU145 cells lacking PCAT-1 expression (Supplementary Fig. 26) or in VCaP cells, in which PCAT-1 is expressed but repressed by PRC2 (Supplementary Fig. 27).

Gene expression profiling of LNCaP knockdown samples on cDNA microarrays indicated that PCAT-1 modulates the transcriptional regulation of 370 genes (255 upregulated, 115 downregulated; FDR < 0.01) (Supplementary Fig. 28 and Supplementary Table 9). Gene ontology analysis of the upregulated genes showed preferential enrichment for gene set concepts such as mitosis and cell cycle, whereas the downregulated genes had no concepts showing statistical significance (Fig. 5c and Supplementary Table 10). Because most affected genes were upregulated when PCAT-1 was depleted, these results suggest that the function of PCAT-1 is predominantly repressive in nature, similar to other lincRNAs. We next validated expression changes in three key PCAT-1 target genes (BRCA2, CENPE and CENPF) whose expression is upregulated upon PCAT-1 knockdown in LNCaP and VCaP cells (Fig. 6a); VCaP cells appear less sensitive to PCAT-1 knockdown, likely owing to their lower overall expression of this transcript.

Figure 5 PCAT-1 promotes cell proliferation. (a) Cell proliferation assays for RWPE benign immortalized prostate cells stably infected with PCAT-1 lentivirus or RFP and LacZ control lentiviruses. An asterisk (*) indicates P < 0.02 by a two-tailed Student’s t-test. (b) Cell proliferation assays in LNCaP using PCAT-1 siRNAs. An asterisk (*) indicates P < 0.005 by a two-tailed Student’s t-test. (c) Gene ontology analysis of PCAT-1 knockdown microarray data using the DAVID program. Blue bars represent the top hits for upregulated genes. Red bars represent the top hits for downregulated genes. DAVID enrichment scores are represented with Benjamini-Hochberg-adjusted P values. All error bars in this figure are mean ± s.e.m.

Figure 6 Prostate cancer tissues recapitulate PCAT-1 signaling. (a) qPCR expression of three PCAT-1 target genes after PCAT-1 knockdown in VCaP and LNCaP cells, as well as following EZH2 knockdown or dual EZH2 and PCAT-1 knockdown in VCaP cells. qPCR data were normalized to the average of (GAPDH + β-actin) and represented as fold change. Error bars represent mean ± s.e.m. (b) Standardized log2-transformed qPCR expression of a set of tumors and metastases with outlier expression of either PCAT-1 or EZH2. The shaded squares in the lower left show Spearman correlation values between the indicated genes (* indicates P < 0.05). Blue and red indicate negative or positive correlation, respectively. The upper squares show the scatter plot matrix and fitted trend lines for the same comparisons. (c) A heatmap of PCAT-1 target genes (BRCA2, CENPF, CENPE) in EZH2-outlier and PCAT-1-outlier patient samples (see Fig. 4b). Expression was determined by qPCR and normalized as in b. (d) A predicted network generated by the HefaLMP program for 7 of 20 top upregulated genes following PCAT-1 knockdown in LNCaP cells. Gray nodes are genes found following PCAT-1 knockdown. Red edges indicate co-expressed genes; black edges indicate predicted protein-protein interactions; and purple edges indicate verified protein-protein interactions. (e) A proposed schematic representing PCAT-1 upregulation, function and relationship to PRC2.

PCAT-1  signatures  in  prostate  cancer 

Because of the regulation of PCAT-1 by PRC2 in VCaP cells, we hypothesized that knockdown of EZH2 would also downregulate PCAT-1 targets as a secondary effect of the resulting upregulation of PCAT-1. Simultaneous knockdown of PCAT-1 and EZH2 would thus abrogate expression changes in PCAT-1 target genes. Carrying out this experiment in VCaP cells demonstrated that PCAT-1 target genes were indeed downregulated by EZH2 knockdown, and that this change was either partially or completely reversed using siRNA oligos to PCAT-1 (Fig. 6a), lending support to the role of PCAT-1 as a transcriptional repressor. Taken together, these results suggest that PCAT-1 biology may exhibit two distinct modalities: one in which PRC2 represses PCAT-1 and a second in which active PCAT-1 promotes cell proliferation. PCAT-1 and PRC2 may therefore characterize distinct subsets of prostate cancer.

To examine these findings in patient tissues, we used qPCR to measure expression of BRCA2, CENPE and CENPF in our cohort of tissue samples. Consistent with our model, we found that samples expressing PCAT-1 tended to have low expression of PCAT-1 target genes (Fig. 6b). Moreover, comparing EZH2-outlier and PCAT-1-outlier patients (Fig. 4b), we found that two distinct phenotypes emerged. Individuals with high EZH2 tended to have high levels of PCAT-1 target genes, and those with high expression of PCAT-1 itself displayed the opposite expression pattern of target genes (Fig. 6c). Network analysis of the top 20 upregulated genes after PCAT-1 knockdown with the HefaLMP tool (ref. 43) further suggested that these genes form a coordinated network (Fig. 6d), corroborating our previous observations. Taken together, these results provide initial insight into the composition and function of the prostate cancer ncRNA transcriptome.
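Fig. 6b summarizes these tissue relationships as Spearman correlations. Spearman's ρ is the Pearson correlation computed on ranks, with tied values given the mean of their rank positions, so it captures monotonic rather than strictly linear trends. A minimal standard-library sketch follows; the function names and values are hypothetical, and the P values reported in the figure are not computed here.

```python
import math

def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical values: PCAT-1 rising while a target gene falls.
pcat1 = [0.5, 1.0, 2.0, 4.0, 8.0]
target = [3.1, 2.7, 1.9, 1.2, 0.4]
print(spearman_rho(pcat1, target))  # approximately -1 (perfectly inverse ranks)
```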

DISCUSSION 

To our knowledge, this study represents the largest RNA-Seq analysis to date and the first to comprehensively analyze a common epithelial cancer from a large cohort of human tissue samples. As such, our study has adapted existing computational tools intended for small-scale use (ref. 3) and developed new methods to distill large numbers of transcriptome data sets into a single consensus transcriptome assembly that accurately represents disease biology (Supplementary Discussion).

Among the numerous uncharacterized ncRNA species detected by our study, we have focused on 121 PCATs, which we believe represent a set of uncharacterized ncRNAs that may have important biological functions in this disease. In this regard, these data contribute to a growing body of literature supporting the importance of unannotated ncRNA species in cellular biology and oncogenesis (refs. 6-12), and more broadly our study confirms the utility of RNA-Seq in defining functionally important elements of the genome (refs. 2-4).

Of particular interest is our discovery of the prostate-specific ncRNA gene PCAT-1, which is markedly overexpressed in a subset of prostate cancers, particularly metastases, and may contribute to cell proliferation in these tumors. It is also notable that PCAT-1 resides in the 8q24 ‘gene desert’ locus, in the vicinity of well-studied prostate cancer risk SNPs and the c-MYC oncogene, suggesting that this locus (and its frequent amplification in cancer) may be linked to additional aspects of cancer biology (Supplementary Discussion). In addition, the interplay between PRC2 and PCAT-1 further suggests that this ncRNA may have an important role in prostate cancer progression (Fig. 6e). Other ncRNAs identified by this analysis may similarly contribute to prostate cancer. Furthermore, recent preclinical efforts to detect prostate cancer noninvasively through the collection of patient urine samples have shown promise for several urine-based prostate cancer biomarkers, including the ncRNA PCA3 (refs. 44,45). Although additional studies are needed, our identification of ncRNA biomarkers for prostate cancer suggests that urine-based assays for these ncRNAs may also warrant investigation, particularly for those that may stratify patient molecular subtypes.

Our  findings  support  an  important  role  for  tissue-specific  ncRNAs 
in  prostate  cancer  and  suggest  that  cancer-specific  functions  of  these 
ncRNAs  may  help  to  drive  tumorigenesis.  We  further  speculate  that 
specific  ncRNA  signatures  may  occur  universally  in  all  disease  states 
and  that  applying  these  methodologies  to  other  diseases  may  reveal 
key  aspects  of  disease  biology  and  clinically  important  biomarkers. 

METHODS 

Methods  and  any  associated  references  are  available  in  the  online  version 
of  the  paper  at  http://www.nature.com/naturebiotechnology/. 

Accession codes. Data from RNA-Seq experiments are deposited at the NCBI Gene Expression Omnibus as GSE25183. PCAT-1 and PCAT-14 nucleotide sequences are deposited in the GenBank nucleotide database (nuccore) as HQ605084 and HQ605085, respectively.

Note:  Supplementary  information  is  available  on  the  Nature  Biotechnology  website. 
ONLINE METHODS

Cell lines, treatments and tissues. All prostate cell lines were obtained from the American Type Culture Collection, except for PrEC (benign nonimmortalized prostate epithelial cells) and PrSMC (prostate smooth muscle cells), which were obtained from Lonza. Cell lines were maintained using standard media and conditions.

For androgen treatment experiments, LNCaP and VCaP cells were grown in androgen-depleted media for 48 h and subsequently treated with 5 nM methyltrienolone (R1881, NEN Life Science Products) or an equivalent volume of ethanol for 48 h before harvesting the cells. For drug treatments, VCaP cells were treated with 20 µM 5'deoxyazacytidine (Sigma), 500 nM of the HDAC inhibitor suberoylanilide hydroxamic acid (SAHA) (Biovision), or both 5'deoxyazacytidine and SAHA. 5'deoxyazacytidine treatments were performed for 6 d with media and drug reapplied every 48 h. SAHA treatments were done for 48 h. DMSO treatments were done for 6 d. For DZNep treatments, DZNep was dissolved in DMSO and VCaP cells were treated with either 0.1 µM DZNep or vehicle control; RNA was harvested at 72 h and 144 h.

Prostate tissues were obtained from the radical prostatectomy series and Rapid Autopsy Program at the University of Michigan tissue core as part of the University of Michigan Prostate Cancer Specialized Program of Research Excellence (S.P.O.R.E.). All tissue samples were collected with informed consent under an Institutional Review Board (IRB)-approved protocol at the University of Michigan.

RNA isolation, cDNA synthesis and PCR experiments. Total RNA was isolated using Trizol and an RNeasy Kit (Invitrogen) with DNase I digestion according to the manufacturer’s instructions. RNA integrity was verified on an Agilent Bioanalyzer 2100 (Agilent Technologies). cDNA was synthesized from total RNA using Superscript III (Invitrogen) and random primers (Invitrogen). Quantitative real-time PCR (qPCR) was done using Power SYBR Green Mastermix (Applied Biosystems) on an Applied Biosystems 7900HT Real-Time PCR System. RT-PCR was done with Platinum Taq High Fidelity polymerase (Invitrogen). All oligonucleotide primers are listed in Supplementary Table 11. For PCR product sequencing, PCR products were resolved on a 1.5% agarose gel and either sequenced directly or extracted using a Gel Extraction kit (Qiagen) and cloned into pCR4-TOPO vectors (Invitrogen). PCR products were bidirectionally sequenced at the University of Michigan Sequencing Core.

RNA-ligase-mediated rapid amplification of cDNA ends (RACE). 5' and 3' RACE was performed using the GeneRacer RLM-RACE kit (Invitrogen) according to the manufacturer’s instructions. RACE PCR products were obtained using Platinum Taq high-fidelity polymerase (Invitrogen), the supplied GeneRacer primers and appropriate gene-specific primers indicated in Supplementary Table 11.

RNA-Seq library preparation. 2 µg of total RNA was selected for polyA+ RNA using Sera-Mag oligo(dT) beads (Thermo Scientific), and paired-end next-generation sequencing libraries were prepared as previously described (ref. 46), using Illumina-supplied universal adaptor oligos and PCR primers (Illumina). Samples were sequenced in a single lane on an Illumina Genome Analyzer I or Genome Analyzer II flow cell using previously described protocols (ref. 46). Paired-end reads of 36-45 nt were generated according to the protocol provided by Illumina.

Overexpression studies. The full-length PCAT-1 transcript was cloned into the pLenti6 vector (Invitrogen) along with RFP and LacZ controls. After confirmation of the insert sequence, lentiviruses were generated at the University of Michigan Vector Core and used to infect the benign immortalized prostate cell line RWPE. RWPE cells stably expressing PCAT-1, RFP or LacZ were generated by selection with blasticidin (Invitrogen), and 10,000 cells were plated into 12-well plates. Cells were harvested and counted at day 2, day 4 and day 6 post-plating with a Coulter counter.


siRNA knockdown studies. Cells were plated and transfected with 20 µM experimental siRNA oligos or nontargeting controls twice, at 12 h and 36 h post-plating. Knockdowns were performed with Oligofectamine in OptiMEM media. Knockdown efficiency was determined by qPCR. siRNA sequences (in sense format) for PCAT-1 knockdown were as follows: siRNA 1, UUAAAGAGAUCCACAGUUAUU; siRNA 2, GCAGAAACACCAAUGGAUAUU; siRNA 3, AUACAUAAGACCAUGGAAAU; siRNA 4, GAACCUAACUGGACUUUAAUU. For EZH2 siRNA, the following sequence was used: GAGGUUCAGACGAGCUGAUUU.
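The sequences above are given in sense format. Assuming the common siRNA design of a 19-nt duplex core followed by a 2-nt 3' overhang (read here as the trailing UU, an assumption not stated in the source), the matching antisense (guide) strand is the reverse complement of the core plus the same overhang. The sketch below derives it and reports GC content; the helper names are hypothetical, and the overhang convention should be checked against the supplier's specifications.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}  # RNA base pairing

def antisense_strand(sense: str, overhang: str = "UU") -> str:
    """Reverse complement of the duplex core, followed by the 3' overhang."""
    if not sense.endswith(overhang):
        raise ValueError(f"sense strand does not end with {overhang!r}")
    core = sense[: len(sense) - len(overhang)]
    return "".join(COMPLEMENT[base] for base in reversed(core)) + overhang

def gc_content(seq: str) -> float:
    """Fraction of G or C bases in a sequence."""
    return sum(base in "GC" for base in seq) / len(seq)

sense = "GCAGAAACACCAAUGGAUAUU"  # PCAT-1 siRNA 2, as listed above
print(antisense_strand(sense))   # UAUCCAUUGGUGUUUCUGCUU
print(round(gc_content(sense[:-2]), 2))  # 0.42 (8 of the 19 core bases)
```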

shRNA knockdown and western blot analysis. Cells were seeded at 50-60% confluency, incubated overnight and transfected with EZH2 or nontargeting shRNA lentiviral constructs for 48 h as previously described. GFP+ cells were drug-selected using 1 µg/ml puromycin. RNA and protein were harvested for PCR and western blot analysis according to standard protocols. For western blot analysis, PVDF membranes (GE Healthcare) were incubated overnight at 4 °C with either EZH2 mouse monoclonal antibody (1:1,000, BD Biosciences, no. 612666) or β-actin antibody (Abcam, ab8226) as a loading control.

Gene expression profiling. Agilent Whole Human Genome Oligo Microarrays were used for cDNA profiling of PCAT-1 siRNA knockdown samples or nontargeting controls according to standard protocols. All samples were run in technical triplicate against nontargeting control siRNA. Expression array data were processed using the SAM method (ref. 47) with an FDR < 0.01. Up- and downregulated probes were separated and analyzed using the DAVID bioinformatics platform (ref. 48).

ChIP. Assays were done as previously described (ref. 25), using 4-7 µg of the following antibodies: IgG (Millipore, PP64), SUZ12 (Cell Signaling, no. 3737) and SUZ12 (Abcam, ab12073). ChIP-PCR reactions were done in triplicate with SYBR Green using 1:150th of the ChIP product per reaction.

In  vitro  translation.  Full-length  PCAT-1,  Halo-tagged  ERG  or  GUS  positive 
control  were  cloned  into  the  PCR2.1  entry  vector  (Invitrogen)  and  in  vitro 
translational  assays  were  done  using  the  TnT  Quick  Coupled  Transcription/ 
Translation  System  (Promega)  with  1  mM  methionine  and  Transcend  Biotin- 
Lysyl-tRNA  (Promega)  according  to  the  manufacturer’s  instructions. 

Bioinformatic analyses. Sequencing reads were aligned with TopHat (ref. 19), and ab initio assembly was performed with Cufflinks (ref. 3). Transcriptome libraries were merged, and statistical classifiers were developed and employed to filter low-confidence transcripts. Nominated transcripts were compared to the UCSC, RefSeq, Vega, Ensembl and ENCODE databases, and coding potential was determined with the txCdsPredict program from UCSC. Transcript conservation was determined with the SiPhy package. Differential expression analysis was performed using SAM methodology, and outlier analysis using a modified COPA method. See the Supplementary Methods for details on the bioinformatics methods used.
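Coding potential was assessed with UCSC's txCdsPredict, whose scoring is more elaborate than anything shown here. As a rough illustration only, the sketch below finds the longest ATG-initiated, stop-terminated ORF on the forward strand of a DNA sequence. Because the libraries were not strand-specific, a real scan would also need the reverse complement, and any length cut-off for a "high-quality" ORF would be an assumption; the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf(seq: str) -> tuple[int, int]:
    """Longest ATG...stop ORF on the forward strand of a DNA sequence.

    Returns (start, end) as 0-based, end-exclusive coordinates that include
    the stop codon, or (0, 0) if no complete ORF is found.
    """
    seq = seq.upper()
    best = (0, 0)
    for frame in range(3):
        start = None  # first ATG in the current open stretch of this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None
    return best

start, end = longest_orf("CCATGAAATTTTAGCC")
print(start, end, (end - start) // 3 - 1)  # 2 14 3 (three codons before the stop)
```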

Statistical analyses for experimental studies. All data are presented as means ± s.e.m. All experimental assays were performed in duplicate or triplicate. Statistical analyses shown in figures represent Fisher’s exact tests or two-tailed Student’s t-tests, as indicated. For details regarding the statistical methods employed during RNA-Seq and ChIP-Seq data analysis, see Supplementary Methods.
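The figures report means ± s.e.m. and two-tailed Student's t-tests. The sketch below computes the s.e.m. and the pooled-variance (equal-variance) Student's t statistic with its degrees of freedom using only the standard library. Converting t to a two-tailed P value requires the t distribution, for which `scipy.stats.ttest_ind` (equal variances by default) is one option. The replicate values and function names are hypothetical.

```python
import math
import statistics

def sem(values: list[float]) -> float:
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))

def students_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Two-sample Student's t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2

# Hypothetical triplicate cell counts (x 10^4) for knockdown versus control.
knockdown = [1.0, 2.0, 3.0]
control = [4.0, 5.0, 6.0]
t, df = students_t(knockdown, control)
print(round(t, 3), df)          # -3.674 4
print(round(sem(control), 3))   # 0.577
```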